AITopics | overestimation bias

cfa45151ccad6bf11ea146ed563f2119-Supplemental.pdf

Neural Information Processing Systems, Apr-27-2026, 04:26:25 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Arizona (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.38)

Regularized Softmax Deep Multi-Agent Q-Learning

Neural Information Processing Systems, Apr-24-2026, 14:57:35 GMT

Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, and that this overestimation is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline, and we demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning-based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
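To make the role of a softmax operator concrete, here is a minimal single-agent sketch in plain Python. It is not the RES algorithm and not the paper's multi-agent approximation; the function names, the inverse temperature `beta`, and the noise setup are assumptions chosen for illustration. It compares a max backup with a softmax-weighted backup when every true action-value is 0 and the estimates carry Gaussian noise.

```python
import math
import random


def softmax_value(q_values, beta):
    """Softmax-weighted average of q_values with inverse temperature beta >= 0."""
    top = max(q_values)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def average_backup_values(num_actions=5, noise_std=1.0, beta=2.0,
                          trials=2000, seed=0):
    """Average max and softmax backups when every true action-value is 0.

    Hypothetical setup: each estimate is the true value (0) plus Gaussian noise,
    so any positive average is pure estimation bias.
    """
    rng = random.Random(seed)
    max_sum = 0.0
    soft_sum = 0.0
    for _ in range(trials):
        noisy_q = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        max_sum += max(noisy_q)
        soft_sum += softmax_value(noisy_q, beta)
    return max_sum / trials, soft_sum / trials


if __name__ == "__main__":
    max_avg, soft_avg = average_backup_values()
    print(f"mean max backup:     {max_avg:.3f}")
    print(f"mean softmax backup: {soft_avg:.3f}")
```

With `beta = 0` the operator reduces to the plain mean of the estimates, and as `beta` grows it approaches the max, so for any finite `beta` the softmax backup is never larger than the max backup on the same estimates. In this sketch that shrinks the upward bias of the backup; it does not remove it.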

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Neural Information Processing Systems, Mar-22-2026, 02:36:31 GMT

Training offline RL models using visual inputs poses two significant challenges: the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the " " for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, which outperforms existing RL approaches by large margins.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

cfa45151ccad6bf11ea146ed563f2119-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 06:35:52 GMT

approximation error, ensemble size, estimation bias, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.37)

cfa45151ccad6bf11ea146ed563f2119-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 06:35:48 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)

Continuous Doubly Constrained Batch Reinforcement Learning

Neural Information Processing Systems, Feb-8-2026, 21:59:48 GMT

The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

0a113ef6b61820daa5611c870ed8d5ee-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 10:34:42 GMT

overestimation bias, softmax operator, value estimate, (11 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning

Neural Information Processing Systems, Dec-26-2025, 19:49:26 GMT

Alleviating overestimation bias is a critical challenge for deep reinforcement learning to achieve successful performance on more complex tasks or offline datasets containing out-of-distribution data. In order to overcome overestimation bias, ensemble methods for Q-learning have been investigated to exploit the diversity of multiple Q-functions. Since network initialization has been the predominant approach to promote diversity in Q-functions, heuristically designed diversity injection methods have been studied in the literature. However, previous studies have not attempted to approach guaranteed independence over an ensemble from a theoretical perspective. By introducing a novel regularization loss for Q-ensemble independence based on random matrix theory, we propose spiked Wishart Q-ensemble independence regularization (SPQR) for reinforcement learning. Specifically, we modify the intractable hypothesis testing criterion for the Q-ensemble independence into a tractable KL divergence between the spectral distribution of the Q-ensemble and the target Wigner's semicircle distribution. We implement SPQR in several online and offline ensemble Q-learning algorithms. In the experiments, SPQR outperforms the baseline algorithms in both online and offline RL benchmarks.

controlling q-ensemble independence, name change, spiked random model, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

cfa45151ccad6bf11ea146ed563f2119-Paper.pdf

Neural Information Processing Systems, Aug-17-2025, 11:42:53 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)

To Reviewer # 3: Thank you for your careful reading and thoughtful reviews

Neural Information Processing Systems, Aug-15-2025, 00:27:33 GMT

To Reviewer #3: Thank you for your careful reading and thoughtful reviews. Q1: Theorems 3 and 4. (i) Theorem 3: Theorem 3 shows that SD2 helps to reduce the overestimation bias compared We empirically show that SD2 does not underestimate and can reduce the absolute bias in Figure 4. The left-hand side in Eq. (19) equals to Q5: How is the performance of the proposed approximation method? We will try to further investigate it in future research. Q2: Related works about ensemble methods.

action space, careful reading and thoughtful review, reviewer, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)
